ABSTRACT

Web search engines and specialized online verticals are in- 
creasingly incorporating results from structured data sources 
to answer semantically rich user queries. For example, the 
query 'Samsung 50 inch led tv' can be answered using in- 
formation from a table of television data. However, the users 
are not domain experts and quite often enter values that do 
not match precisely the underlying data. Samsung makes 
46- or 55- inch led tvs, but not 50-inch ones. So a literal 
execution of the above mentioned query will return zero re- 
sults. For optimal user experience, a search engine would 
prefer to return at least a minimum number of results as 
close to the original query as possible. Furthermore, due to 
typical fast retrieval speeds in web-search, a search engine 
query execution is time-bound. 

In this paper, we address these challenges by proposing 
algorithms that rewrite the user query in a principled man- 
ner, surfacing at least the required number of results while 
satisfying the low-latency constraint. We formalize these re- 
quirements and introduce a general formulation of the prob- 
lem. We show that under a natural formulation, the problem 
is NP-Hard to solve optimally, and present approximation 
algorithms that produce good rewrites. We empirically val- 
idate our algorithms on large-scale data obtained from a 
commercial search engine's shopping vertical. 

1. INTRODUCTION 

Web users are increasingly looking for information beyond
the traditional sources. This is manifested in search engines
like Google and Bing by the inclusion of answers beyond 10
page links, and in the tremendous growth of specialized
search engines such as Amazon. Often the rich experience is
provided via semantic information that comes from
(semi-)structured data sources in the form of tables, XML
files, or databases. For example, structured data can be used
to answer queries in domains such as electronic goods (e.g.
'50 inch Samsung led tv'), fashion (e.g. '$1600 prada
handbags'), movie showtime listings (e.g. 'avat showtimes
near san francisco'), and weather prediction (e.g. 'weather
in new york').

A major challenge in using structured data to answer web
queries is that users often lack domain expertise and may
pose queries that lead to very few or no results due to
unfamiliarity with the underlying data sources. For example,
consider the query '50 inch Samsung led tv'. There exists
work in the literature [17, 21] that can correctly classify
and semantically interpret the query as attribute-value pairs
that correspond to underlying structured attributes. So the
query can be thought of as (50 inch → display size, Samsung →
brand, led tv → display type). However, if the
query is directly evaluated as specified, there will be no re- 
sults that can satisfy the interpretation as Samsung does 
not make 50-inch LED TVs. On the other hand, Samsung 
makes 46-inch and 55-inch LED TVs and 50-inch PLASMA 
TVs. Arguably, the users would prefer to see such results 
that are close to their original query instead of looking at 
an empty page with no results because they did not know 
the appropriate precise values when typing the query. 

The challenge is common to today's systems and not re- 
stricted to the electronics domain but applies broadly to 
answering web queries in a variety of domains including 
handbags or shoes, for example consider the query $1600 
prada handbags. One strategy for handling this challenge 
is to rewrite the query to broaden its coverage. In the con- 
text of online search, such rewrites include a variety of tech- 
niques such as query term deletion, phrasal substitution, and 
mining of similar queries. In fact, the query $1600 prada
handbags does not return any products on Amazon and is
handled using term deletion, as shown in Figure 1. However,
this approach provides no quality guarantees and does not
take advantage of the rich metadata available in structured
data sources, and so produces results that leave a lot to be
desired.

We are interested in rewriting the queries through se- 
mantic term expansion. For example, the above queries 
may be rewritten as ' (46 to 52 inch) (samsung or sony) 
(led or plasma) tv' and ' ($1400 to $1800) (prada or guc 
handbags' respectively. This query rewriting problem can be 
viewed as a generalization of query rewriting through syn- 
onyms to increase recall, for example, from 'women shoes' 
to ' (women or women's) (footwear or shoes)'. Note that 
we are not interested in a set of static rewrite rules, such as 
those used in synonym detection and stemming, but rather 
a query rewrite algorithm that can understand the query 
intent and adapt accordingly. 

The quality of the rewrites depends on two factors. First,
the rewritten query should preserve the meaning of the
original query as closely as possible. We measure the
fidelity of the rewrite by computing how far the retrieved
results are from the original query, as estimated by user
preferences learned through click logs. Second, the rewritten
query should ensure that sufficiently many results are
returned to the user. We measure the coverage of the rewrite
by counting how often the requested number of results is
actually returned.

Figure 1: The query $1600 prada handbags on amazon.com is
handled by dropping terms in the query successively and
surfacing the results from each rewritten query separately.
(Screenshot text: Your search "$1600 prada handbags" did not
match any products.)

If efficiency had not been an issue, a candidate solution 
would have been to expand the terms a little at a time, is- 
sue the rewritten query to the index, and repeat as necessary 
until the minimum number of results requested is retrieved. 
This solution would not be applicable to web search, how- 
ever, as users expect results to be returned in under half 
a second, thus placing a strict performance requirement on 
the query rewriting component. Hence, many search en- 
gines place a restriction on the number of re-written queries 
(also called query augmentations) that can be issued to the 
index as part of the original query execution. To ensure 
this requirement is met, we require that the techniques may 
only use precomputed statistics of the index but may not 
access the index at run time, as index access contributes the 
lion's share of running time. Similarly, we also require that 
the techniques can take an input parameter that limits the 
number of alternative rewrites they can examine. 

In this paper, we formulate the above problem as time- 
bound query rewriting for structured web queries. As part 
of our contributions we formally describe an optimization 
framework that takes as input a candidate query q, a desired 
number of results k, and a parameter T that governs how 
many rewrites can be considered, and produces a rewritten 
query that aims to retrieve at least k results that match the
original query well. We show that finding the optimal
solution to this problem is NP-Hard. We
introduce a greedy algorithm and a dynamic programming 
solution that rewrite the query in a principled and controlled 
fashion. We also study the effect of functional dependencies 
in the data and how they affect query rewrite. We evaluate 
the proposed solution using real queries from a commercial 
search engine's shopping vertical against a prototype com- 
merce search engine. 

The rest of the paper is organized as follows. In Section 
2, we discuss related work. In Section 3, we describe our 
model and assumptions about structured web search, and 
formulate the problem of predicate relaxation. To meet the 
performance requirement, one needs to pre-compute statis- 
tics on the database to be used at runtime. In Section 4, we 
describe two kinds of statistics — histograms and functional 
dependencies — and give two heuristics for using these statis- 
tics to perform fast predicate relaxation. In Section 5, we 
report our experimental evaluation of these heuristics con- 
ducted over data from a commercial search engine's vertical. 
We summarize and conclude in Section 6. 

2. RELATED WORK 

Structured data is abundant on the web, and there have 
been studies on how to retrieve them in a manner suitable 
to web search [71 |H] . There is also work on how to retrieve 
and rank information from structured data [12, 9, 6, 15, 18].
When answering web queries over structured data, however,
low quality results due to possible misinterpretations of data 
types. For example, a database might store the television 
diagonal as the string '50 inches' while users may type '50"'.
To this end, recent work has studied how to analyze keyword 
queries as typed in a web search box and interpret them as 
structured queries [17, 21]. These past works form the basic
components over which we build our system for answering 
web queries using structured data. 

Rewriting user queries to broaden coverage is a common 
technique employed by all search engines. For example, 
search engines routinely make spelling corrections to queries 
when retrieving results. In the context of search over struc- 
tured data sources, textual similarity approaches that treat 
the query as a bag of words will generally perform poorly. In 
the example query given in the Introduction, there is no tex- 
tual relaxation between Samsung and Sony, and little can be 
done for generating term expansions or substitutions for the 
diagonal size in a controlled manner. Past approaches based 
on log mining [51 ll3in3] may be able to discover relationship 
between terms that do not exhibit textual similarity, but 
they do not address how such knowledge can be exploited 
in conjunction with statistics of the documents to come up 
with good rewrites of the queries that preserve fidelity and 
ensure coverage. 

Fontoura et al. proposed a method to relax text queries 
using taxonomies [10]. Their approach can also be viewed
as rewriting queries taking advantage of a taxonomy created 
by experts, and thus solves a similar problem to ours. How- 
ever, creating a good taxonomy requires significant domain 
knowledge, and is an expensive process. In our application 
domain, we do not have such a taxonomy available, and 
hence the work is not directly comparable. 

There has been work in the database community that
investigates the problem of keyword search over structured
data [HI, 9, 12, 15, 18]. These works assume that the
expansion of the keywords is handled through probabilistic
methods or captured in the ranking function, and focus on
performance issues. Like them, we are concerned with
performance, and capture the requirements by explicitly
specifying them in our framework; our work differs in that it
allows more controlled rewriting behavior that provides
quality guarantees.

Finally, given a query and a distance function, one can
think of the problem we are trying to solve as a nearest
neighbor problem. Nearest neighbor problems have been
studied in the past, for example [20]. More recently there
are also k-nearest neighbor approaches, such as
[5, 25, 4, 23, 22, 24], that are applicable in the setting of
searching over a database. Related to k-nearest neighbors,
but approached as a join relaxation problem in databases, is
the work described in [16]. We find such work very valuable
for relaxing the user query and finding good quality results
within a reasonable distance of what was specified in the
query. Although useful, these techniques differ fundamentally
from our work. In the setting of web search over structured
data, we need quality guarantees on the relaxation and, at
the same time, must meet strict performance requirements in
the form of an upper time bound. Further, these approaches
admit relaxations of numeric attributes only, and extensions
to categorical attributes are non-trivial. In contrast, our
approaches come with two advantages: (1) they are very simple
to implement; and (2) they support distance functions on both
categorical and numeric attributes. In fact, we use one such
distance function in our experiments and show that our
algorithms perform well in practice.

3. PROBLEM FORMULATION 

We first describe a model of structured web queries, and 
assumptions on how they are parsed, and how items are eval- 
uated with respect to the parsed queries. We then formally 
define the problem of time bound query rewrites. 

3.1 Model 

Given a keyword web query, we assume the existence of a
semantic parser that identifies the attributes requested in
the query and extracts their associated desired values,
based on past work such as [17, 21]. For example, the query
'50 inch Samsung led tv' is parsed as a structured query
{table:TV, brand:Samsung, type:LED, diagonal:50}. Denote a
generic parsed query by its attribute-value pairs,
$q = \{a_1 : v_1, a_2 : v_2, \ldots, a_m : v_m\}$. Denote the
value of attribute $a_i$ in query $q$ by $q_{a_i}$. For our
example query, $q_{brand}$ = Samsung. Consistent with the
interpretation of web queries as conjunctions of keywords, we
interpret the structured query under the AND-semantics as
well. For the rest of the paper we assume that structured
queries are given to us in the form of attribute-value pairs.
In practice, not all terms in a query will be understood by
the parser. The terms that are not understood are treated as
keywords used by the ranking function as additional signals.

Let $P$ be a database of items, from which we retrieve
results to serve the query. We represent each item $p \in P$
as a set of attribute-value pairs
$\{a_1 : w_1, a_2 : w_2, \ldots, a_n : w_n\}$, and denote the
value of attribute $a_i$ by $p_{a_i}$. We assume that the
semantic parser will only identify attributes for which we
have data, hence the query specifies the values of a subset
of these $n$ attributes. Henceforth, when a query
$\{a_1 : v_1, a_2 : v_2, \ldots, a_m : v_m\}$ is given, we
only focus on the $m$ attributes mentioned. We give an
example table of the database for TVs in Table 1, which we
use throughout the paper for illustration. The size of the
database will be significantly larger in practice.

Table 1: Example database for TVs.

Table 2: Example distance function for TVs.

As discussed in the Introduction, users may lack domain 
expertise and may be unfamiliar with the attribute spec- 
ification of the underlying structured data. Consider the 
sample query {table:TV, brand:Samsung, type:LED,
diagonal:50}. For the database table in Table 1, there is no TV
that matches all the requested attribute values. Nonetheless, 
it is desirable that a search engine should return results that 
are close to the query, for example, Samsung LED TVs of 
46 inches or 55 inches, or Sharp LED TVs of 52 inches. It 
would be less desirable, however, if the search engine returns 
a Samsung LED TV of 32 inches, since that TV is much 
smaller than requested, or a Sony CRT TV of 50 inches, 
since the type of TV is significantly different than requested. 

To make the discussion formal, denote the domain of
attribute $a_i$ by $A_i$. Let the function
$d_i : A_i \times A_i \to [0, 1]$ be such that $d_i(v, w)$
measures the distance of attribute value $w$ from attribute
value $v$. When $d_i(v, w)$ is small, attribute value $w$ is
similar to attribute value $v$. We give an example distance
function for TVs in Table 2. We note that our solution does
not depend on assumptions such as symmetry or the triangle
inequality about the distance function.

An aggregate distance function
$ad : P \times Q \to \mathbb{R}$, $ad(p, q)$, measures how
well item $p$ matches query $q$. When $ad(p, q)$ is small,
item $p$ matches the query $q$ well. We assume that the
function depends only on the attribute values of the item and
the query. We next define a basic yet fundamental property of
aggregate distance functions that we assume throughout the
paper.

Definition 1 (Monotonicity). An aggregate distance
function, $ad(\cdot)$, is monotonic if for any query
$q = \{a_1 : v_1, a_2 : v_2, \ldots, a_m : v_m\}$ and any two
items $p^1$ and $p^2$, if

$$\forall\, 1 \le i \le m, \quad d_i(v_i, p^1_{a_i}) \ge d_i(v_i, p^2_{a_i}) ,$$

then $ad(p^1, q) \ge ad(p^2, q)$.

Monotonicity ensures that an item closer to the query in
each of the attributes will also be closer to the query in
aggregate distance. This is a natural property that should be
satisfied when the attribute distances determine how well an
item matches a query. Example aggregate distance functions
that satisfy monotonicity include weighted sums of the
attribute distances, and $\ell_p$-norms that treat attribute
distances as vectors in m-dimensional space.
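
As an illustration of the weighted-sum case, here is a minimal sketch in Python. The distance tables, the 50-inch normalization, and the equal weights are assumptions made up for this example and are not taken from the paper's Table 2; the point is only that non-negative weights keep the aggregate monotonic.

```python
from typing import Callable, Dict

# Hypothetical brand distances from the queried brand; the numbers are made up.
BRAND_TABLE = {("Samsung", "Sony"): 0.2, ("Samsung", "Sharp"): 0.5}


def brand_distance(v: str, w: str) -> float:
    """Distance of value w from queried value v; unknown pairs count as 1."""
    return 0.0 if v == w else BRAND_TABLE.get((v, w), 1.0)


def diagonal_distance(v: float, w: float) -> float:
    """Assumed scale: a difference of 50 inches or more maps to distance 1."""
    return min(abs(v - w) / 50.0, 1.0)


def weighted_sum_distance(
    query: Dict[str, object],
    item: Dict[str, object],
    distances: Dict[str, Callable],
    weights: Dict[str, float],
) -> float:
    """ad(p, q) as a weighted sum of the per-attribute distances d_i."""
    return sum(weights[a] * distances[a](query[a], item[a]) for a in query)


DISTANCES = {"brand": brand_distance, "diagonal": diagonal_distance}
WEIGHTS = {"brand": 0.5, "diagonal": 0.5}  # non-negative weights keep ad monotonic

query = {"brand": "Samsung", "diagonal": 50}
near = {"brand": "Sony", "diagonal": 46}  # closer to the query in every attribute
far = {"brand": "Sharp", "diagonal": 32}

ad_near = weighted_sum_distance(query, near, DISTANCES, WEIGHTS)
ad_far = weighted_sum_distance(query, far, DISTANCES, WEIGHTS)
```

Because `near` is at least as close as `far` in each attribute, monotonicity requires `ad_near <= ad_far`, which holds for any choice of non-negative weights.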

It is possible that a search engine may choose a rank- 
ing function that does not satisfy monotonicity. This hap- 
pens when the ranking function takes into account addi- 
tional sources of signals such as click activities in deciding 
how well an item matches a query. This is outside of the 
scope of our problem formulation. 

3.2 Query Rewrite Formulation 

Given a query $q$, our ultimate goal is to find the top-$k$
items that match the query within a fixed time window. In
order to find the top-$k$ items, we must first be able to
select at least $k$ items from the database. This may not be
possible when there are fewer than $k$ items that match all
the desired attribute values. The focus of our work is on how
to rewrite the query in a principled way so as to ensure that
sufficiently many items are returned, keeping fidelity to the
original query, while respecting the time constraints imposed
on the individual components of a search engine.

For a given attribute-value pair $a_i : v$ and value
$\delta \in [0, 1]$, let $B_i(v, \delta)$ be the set of
attribute values that are $\delta$-close to $v$, i.e.,

$$B_i(v, \delta) = \{ w \in A_i \mid d_i(v, w) \le \delta \} . \qquad (1)$$

In other words, $B_i(v, 0)$ is the set of attribute values
that are equivalent to $v$, whereas $B_i(v, 1)$ is the set of
all attribute values. For example, for the distance function
in Table 2, $B_b$('Samsung', 0.2) = {'Samsung', 'Sony'}.

Denote a relaxed query $q$ by
$\{a_1 : v_1 \pm \delta_1, a_2 : v_2 \pm \delta_2, \ldots, a_m : v_m \pm \delta_m\}$.
A database item $p$ matches $q$ if and only if

$$\forall\, 1 \le i \le m, \quad p_{a_i} \in B_i(v_i, \delta_i) .$$
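
The two definitions above can be sketched directly in Python. The distance values and the 20-inch scale below are hypothetical stand-ins for Table 2; the only value taken from the text is that 'Sony' lies within 0.2 of 'Samsung'. The names `close_values` and `matches` are illustrative.

```python
# Hypothetical distances; only "Sony within 0.2 of Samsung" comes from the text.
BRAND_TABLE = {("Samsung", "Sony"): 0.2, ("Samsung", "Sharp"): 0.5}


def brand_distance(v, w):
    return 0.0 if v == w else BRAND_TABLE.get((v, w), 1.0)


def diagonal_distance(v, w):
    # Assumed scale: 20 inches or more of difference maps to distance 1.
    return min(abs(v - w) / 20.0, 1.0)


DISTANCES = {"brand": brand_distance, "diagonal": diagonal_distance}


def close_values(domain, dist, v, delta):
    """B_i(v, delta): all values w in the domain with d_i(v, w) <= delta."""
    return {w for w in domain if dist(v, w) <= delta}


def matches(item, relaxed_query, distances):
    """An item matches iff each queried attribute value is delta_i-close."""
    return all(
        distances[a](v, item[a]) <= delta
        for a, (v, delta) in relaxed_query.items()
    )


brands = {"Samsung", "Sony", "Sharp"}
ball = close_values(brands, brand_distance, "Samsung", 0.2)

# A relaxed query maps each attribute to (requested value, allowed relaxation).
relaxed = {"brand": ("Samsung", 0.2), "diagonal": (50, 0.3)}
sony_46 = {"brand": "Sony", "diagonal": 46}
sharp_52 = {"brand": "Sharp", "diagonal": 52}
```

Here `ball` contains 'Samsung' and 'Sony', `sony_46` matches the relaxed query, and `sharp_52` does not because its brand is too far from 'Samsung'.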

At a high level, the query rewrite problem for structured
web queries is to take an input query and find a relaxed
query that will result in at least $k$ matches in the
database. If time had not been an issue, a simple solution
would be to iteratively make a small relaxation to the query,
issue it to the database to find out the number of matches,
and repeat until we have found $k$ results. However, due to
the performance requirement imposed by web search, this
approach is infeasible as database access is costly. Indeed,
an algorithm may only be able to carry out a small amount of
computation within the time envelope.

To capture these limitations, we propose to bound the time
of any solution by the number of different relaxed queries it
considers, and include this as an explicit parameter in the
problem specification. To ensure that this meaningfully
reflects the performance requirement and is helpful in
differentiating among solutions, we require that the amount
of time it takes to evaluate each relaxed query be constant.
Note that different forms of evaluating the relaxation will
lead to different classes of problems. For example,
evaluation via issuing the relaxed query to a database will
constitute a different class of problems from evaluation via
approximation by database statistics. In this paper, we focus
on the latter form of evaluation, as made clear in Section 1.
We model the performance requirement using this abstract
bound in place of an actual time parameter, as the actual
amount of time needed varies across systems and depends on
the quality of the implementation.

We now give a formal definition of the problem.

Definition 2 (Time Bound Query Rewrite). Given:

• A query $q = \{a_1 : v_1, a_2 : v_2, \ldots, a_m : v_m\}$;

• A database of items $P = \{p^1, \ldots, p^n\}$;

• The minimum number of items to return, $k$;

• The maximum number of relaxed queries considered, $T$.

Find a relaxed query
$q' = \{a_1 : v_1 \pm \delta_1, a_2 : v_2 \pm \delta_2, \ldots, a_m : v_m \pm \delta_m\}$
with at most $T$ relaxed queries considered, such that the
set of items that match the query $q'$, $S \subseteq P$, has
size at least $k$, and that the average aggregate distance
among all items in $S$ from query $q$,

$$ad(S, q) = \frac{1}{|S|} \sum_{p \in S} ad(p, q) , \qquad (2)$$

is minimized.
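
To make the objective in Equation (2) concrete, the sketch below evaluates one candidate relaxed query against a tiny in-memory item list. The items, the attribute scales, and the use of an unweighted mean as the aggregate distance are all assumptions for illustration; note also that this evaluation reads the items directly, which the formulation deliberately avoids at run time.

```python
from statistics import mean

# Assumed normalization scales for two numeric attributes (hypothetical).
SCALES = {"diagonal": 20.0, "price": 1000.0}


def distance(attr, v, w):
    """Per-attribute distance d_i in [0, 1]."""
    return min(abs(v - w) / SCALES[attr], 1.0)


def matches(item, relaxed_query):
    return all(
        distance(a, v, item[a]) <= delta
        for a, (v, delta) in relaxed_query.items()
    )


def aggregate_distance(item, relaxed_query):
    """Unweighted mean of attribute distances: one monotonic choice of ad."""
    return mean(distance(a, v, item[a]) for a, (v, _) in relaxed_query.items())


def evaluate_rewrite(items, relaxed_query):
    """Return the matching set S and the average aggregate distance ad(S, q)."""
    selected = [p for p in items if matches(p, relaxed_query)]
    if not selected:
        return selected, None
    return selected, mean(aggregate_distance(p, relaxed_query) for p in selected)


items = [
    {"diagonal": 46, "price": 900},
    {"diagonal": 55, "price": 1500},
    {"diagonal": 32, "price": 400},
]
relaxed = {"diagonal": (50, 0.3), "price": (1000, 0.5)}
selected, score = evaluate_rewrite(items, relaxed)
```

With these made-up numbers, two of the three items match, so a request for k = 2 is satisfied, and `score` is the quantity that Definition 2 asks to minimize over feasible rewrites.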

4. STATISTICS AND HEURISTICS 

To enable fast evaluation of candidate relaxed queries, 
one can precompute statistics on the database, and esti- 
mate the number of matches using these statistics. We 
consider two statistics — histograms of attribute values and 
attribute dependencies estimated as conditional probability 
distributions — which are commonly computed in databases, 
and formulate a version of time bound query rewrite prob- 
lem. We then present two heuristics, one based on a greedy 
approach, and another based on dynamic programming, and 
discuss trade-offs between the two approaches. 

4.1 Statistics 

4.1.1 Histograms 

One of the most important statistics of an attribute is the
distribution of its values, termed the histogram. Histograms
can help to provide an estimate of the number of potential
matches to a query without direct database access.

Formally, let the histogram of attribute $a_i$ be $h_i$, such
that for a set of attribute values $V \subseteq A_i$,
$h_i(V)$ returns the number of items whose value for $a_i$
lies in $V$. For example, the histogram for the brand
attribute in our example database would be

$$h_b(\text{'Samsung'}) = 5 , \quad h_b(\text{'Sony'}) = 3 , \quad h_b(\text{'Sharp'}) = 2 .$$
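
A histogram of this kind can be precomputed in one pass over an attribute column. The sketch below uses a hypothetical 10-item brand column chosen to reproduce the three counts quoted above; `build_histogram` and `h` are illustrative names.

```python
from collections import Counter

# Hypothetical brand column of a 10-item table, chosen to match the counts above.
brands = ["Samsung"] * 5 + ["Sony"] * 3 + ["Sharp"] * 2


def build_histogram(values):
    """Precompute the value distribution of one attribute."""
    return Counter(values)


def h(histogram, value_set):
    """h_i(V): number of items whose attribute value lies in the set V."""
    return sum(histogram[v] for v in value_set)


h_b = build_histogram(brands)
samsung_or_sony = h(h_b, {"Samsung", "Sony"})
```

Once the histogram is built, evaluating `h` for any value set needs no access to the items themselves, which is what makes it usable under the time bound.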

If one assumes that the attributes in the query are
independent, one can estimate the number of matches to a
relaxed query as follows. For a query
$q = \{a_1 : v_1 \pm \delta_1, a_2 : v_2 \pm \delta_2, \ldots, a_m : v_m \pm \delta_m\}$,
the estimated number of matches equals

$$\mathrm{EST}(q) = |P| \prod_{i=1}^{m} \frac{h_i(B_i(v_i, \delta_i))}{|P|} . \qquad (3)$$

As an example, for $q$ = {brand=Samsung ±0.2, type=LED ±0.2,
diagonal=50 ±0.3},

$$\mathrm{EST}(q) = 10 \left( \frac{h_b(B_b(\text{'Samsung'}, 0.2))}{10} \right) \left( \frac{h_t(B_t(\text{'LED'}, 0.2))}{10} \right) \left( \frac{h_d(B_d(50, 0.3))}{10} \right) = 10\,(0.8)(0.8)(0.7) = 4.48 .$$
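
The estimate is a one-line computation once the histogram counts are available. The following sketch reproduces the arithmetic of the example, using exact rational arithmetic to avoid floating-point noise; the function name `estimate_matches` is illustrative.

```python
from fractions import Fraction
from math import prod


def estimate_matches(db_size, counts):
    """EST(q) = |P| * prod_i (h_i(B_i(v_i, delta_i)) / |P|), assuming independence."""
    return db_size * prod(Fraction(c, db_size) for c in counts)


# Counts 8, 8, 7 correspond to the fractions 0.8, 0.8, 0.7 in the example.
est = estimate_matches(10, [8, 8, 7])
```

Here `est` equals 112/25, i.e. 4.48, in agreement with the worked example.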

When attributes are dependent, the estimate could be 
misleading. Functional dependencies may help to improve 
the estimate. 

The maximum aggregate distances of the set of selected
items to a query cannot be determined by the histograms
alone. Therefore, one cannot directly optimize the objective
in Equation (2). Instead, we focus on bounding the aggregate
distance by controlling the total amount of relaxation, and
define the problem of query rewrite using histograms as
follows.

Definition 3 (Query-Rewrite-Histograms). Given:

• A query $q = \{a_1 : v_1, a_2 : v_2, \ldots, a_m : v_m\}$;

• Database size $|P|$;

• Histograms $h_i$ for each attribute $a_i$;

• The minimum number of items to return, $k$;

• The maximum number of relaxed queries considered, $T$.

Find a relaxed query
$q' = \{a_1 : v_1 \pm \delta_1, a_2 : v_2 \pm \delta_2, \ldots, a_m : v_m \pm \delta_m\}$
with at most $T$ relaxed queries considered, such that
EST($q'$) is at least $k$, and that the total amount of
relaxation,

$$tr(q') = \sum_{i=1}^{m} \delta_i , \qquad (4)$$

is minimized.
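
For intuition, Definition 3 can be solved by brute force on a small grid when the bound T is ignored. The sketch below is such a baseline, not one of the paper's heuristics: it enumerates every combination of relaxation steps, so its cost grows exponentially in the number of attributes. The count table is hypothetical, chosen to agree with the fractions quoted in the running TV example where those are stated and assumed elsewhere.

```python
from itertools import product
from math import prod

# COUNTS[a][s]: assumed number of items whose value for attribute a lies within
# s steps of size eps = 0.1 of the queried value (hypothetical data).
COUNTS = {
    "brand": [5, 5, 8, 10, 10, 10],
    "type": [4, 8, 8, 8, 8, 8],
    "diagonal": [1, 4, 4, 7, 7, 7],
}
DB_SIZE = 10


def exhaustive_rewrite(counts, db_size, k):
    """Return the step vector with the smallest total relaxation and EST >= k."""
    attrs = list(counts)
    # EST >= k  <=>  prod(counts) >= k * |P|^(m-1); integers avoid rounding.
    threshold = k * db_size ** (len(attrs) - 1)
    best = None
    for steps in product(*(range(len(counts[a])) for a in attrs)):
        if prod(counts[a][s] for a, s in zip(attrs, steps)) < threshold:
            continue
        if best is None or sum(steps) < sum(best):
            best = steps
    return None if best is None else dict(zip(attrs, best))


best = exhaustive_rewrite(COUNTS, DB_SIZE, k=3)
```

On this data the minimum total relaxation is five steps (0.5), reached by relaxing brand by three steps and type and diagonal by one step each.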



Later in this section, we show that this problem is hard 
(even in the absence of a limit on the number of relaxed 
queries considered), and propose heuristics for solving this 
problem. 

4.1.2 Attribute Dependencies 

Suppose a query specifies both a brand and a model.
Consider the example database in Table 1. If only one of the
two attributes is relaxed, there will be no additional
matches for the relaxed query. Yet the estimate using
Equation (3), based on the assumption that attributes are
independent, would erroneously estimate that the number of
matches will increase after the relaxation. To address this
problem, one has to account for attribute dependencies in the
database.

We start by precomputing the conditional probabilities
$P(a_i = v_i \mid a_j = v_j)$ for all pairs of attributes
$a_i$ and $a_j$ in the database. For query
$q = \{a_1 : v_1, a_2 : v_2, \ldots, a_m : v_m\}$, if
$P(a_i = v_i \mid a_j = v_j)$ is higher than some threshold,
this indicates that the attributes are dependent, and we
propose to drop either $a_i$ or $a_j$ from the query. We
believe there are good arguments for either choice; which one
performs better depends on whether we have a better distance
function for attribute $a_i$ or $a_j$. To test the effect of
attribute dependencies, we evaluated both possible directions
in our experiments.
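
A minimal sketch of this preprocessing step follows. The items, the model names, the threshold value, and the function names are hypothetical; the `drop` parameter selects which of the two directions is taken, and the probabilities are computed on the fly here rather than precomputed.

```python
from itertools import permutations


def conditional_probability(items, ai, vi, aj, vj):
    """Empirical P(a_i = v_i | a_j = v_j) over the item list."""
    given = [p for p in items if p.get(aj) == vj]
    if not given:
        return 0.0
    return sum(1 for p in given if p.get(ai) == vi) / len(given)


def prune_dependent(query, items, threshold, drop="determined"):
    """Drop one attribute of each dependent pair before relaxation.

    drop="determined" removes a_i (the attribute implied by a_j);
    drop="determinant" removes a_j instead.
    """
    pruned = dict(query)
    for ai, aj in permutations(query, 2):
        if ai not in pruned or aj not in pruned:
            continue  # one side of the pair was already dropped
        if conditional_probability(items, ai, query[ai], aj, query[aj]) > threshold:
            del pruned[ai if drop == "determined" else aj]
    return pruned


# Hypothetical items: the model name determines the brand, but not vice versa.
items = [
    {"brand": "Samsung", "model": "UN46"},
    {"brand": "Samsung", "model": "UN55"},
    {"brand": "Sony", "model": "KDL52"},
    {"brand": "Samsung", "model": "UN46"},
]
query = {"brand": "Samsung", "model": "UN46"}
keep_model = prune_dependent(query, items, threshold=0.9)
keep_brand = prune_dependent(query, items, threshold=0.9, drop="determinant")
```

In this toy data P(brand = Samsung | model = UN46) is 1, so the pair is flagged; keeping the model preserves the more specific constraint, while keeping the brand leaves the attribute that is easier to relax.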

After this preprocessing step, we apply the same tech- 
niques for Query-Rewrite-Histograms on the modified 
instance. It may be possible to use the conditional probabil- 
ities in a finer-grained manner to further improve the query 
rewriting process; we leave that for future work. 

4.2 Hardness of Query-Rewrite-Histograms

The problem of Query-Rewrite-Histograms is closely
related to knapsack problems, and is hard to solve optimally.

Theorem 1. Query-Rewrite-Histograms is NP-hard, even in
the absence of a bound on the maximum number of relaxations
considered.

Proof. We reduce Subset-Product, an NP-hard problem, to a
decision version of Query-Rewrite-Histograms(c), where we ask
if there exists a relaxation for which $tr(q') \le c$.

The Subset-Product problem (SP14, [11]) is as follows. Given
a finite set $A$, a size $s(a) \in \mathbb{Z}^+$ for each
$a \in A$, and a positive integer $B$, is there a subset
$A' \subseteq A$ where $\prod_{a \in A'} s(a) = B$?

We create an instance of Query-Rewrite-Histograms(c) as
follows. We map each element in the finite set $A$ to an
attribute. Create a query
$q = \{a_1 : v_1, a_2 : v_2, \ldots, a_m : v_m\}$, where
$m = |A|$, and a database of size
$|D| > \max_{a \in A} s(a)$; the latter serves as a
normalization constant for our problem. For each item
$a \in A$ with size $s(a)$, create a histogram for attribute
$a$ with

$$h_a(B_a(v_a, t)) = \begin{cases} 1 & \text{for } 0 \le t < \log s(a) \\ s(a) & \text{for } t \ge \log s(a) \end{cases}$$

Set $k = B / |D|^{m-1}$, and the decision parameter
$c = \log B$.

The instance of Query-Rewrite-Histograms(c) evaluates to YES
if and only if there exists a relaxed query
$q' = \{a_1 : v_1 \pm \delta_1, a_2 : v_2 \pm \delta_2, \ldots, a_m : v_m \pm \delta_m\}$
satisfying

$$\mathrm{EST}(q') = \frac{\prod_{i : \delta_i = \log s(a)} s(a)}{|D|^{m-1}} \ge \frac{B}{|D|^{m-1}} ,$$

$$tr(q') = \sum_{i : \delta_i = \log s(a)} \log s(a) \le c = \log B ,$$

which is possible only if
$\prod_{i : \delta_i = \log s(a)} s(a) = B$.

One loose end remains: the exact values $\log s(a)$ and
$\log B$ are not representable in a finite number of digits.
We need to show that the reduction continues to hold after
rounding these inputs to some precision $\epsilon$, and that
$\log(1/\epsilon)$ is polynomial in the size of the
Subset-Product instance. When $\log s(a)$ and $\log B$ can
have at most an error of $\epsilon$, for $tr(q')$ to be
smaller than $\log B$ but not $\log(B + 1)$, we need

$$(\log B + \epsilon) + n\epsilon < (\log(B + 1) - \epsilon)$$
$$(n + 2)\epsilon < \log((B + 1)/B) < 1/B$$
$$\epsilon < 1/((n + 2)B) ,$$

or $\log(1/\epsilon) = O(\log nB)$, as desired. □

Therefore, we rely on heuristic approaches to solve the
problem.
Step   δ_b   δ_t   δ_d   h_b   h_t   h_d   EST(q)
...
...    ...   0.1   0.3   5     8     7     2.80
6      0.2   0.1   0.3   8     8     7     4.48

Table 3: Greedy-Rewrite with ε = 0.1.



4.3 Algorithms for Query Rewrite 

4.3.1 Greedy-Rewrite 

A general template for solving Query-Rewrite-Histograms
is to (1) select an attribute based on some criterion, (2)
relax it by a small amount $\epsilon$ to get a relaxed query
$q$, (3) compute the estimate EST($q$), and (4) repeat as
long as EST($q$) < $k$. Different choices of selection
criteria give rise to different heuristics.

In Greedy-Rewrite, we select an attribute to relax based
on how constraining the attribute is. Formally, for a relaxed
query
$q = \{a_1 : v_1 \pm \delta_1, a_2 : v_2 \pm \delta_2, \ldots, a_m : v_m \pm \delta_m\}$,
we relax the most constraining attribute $a_i$, i.e., the one
for which $h_i(B_i(v_i, \delta_i))$ is the smallest.

As an example, consider again the query $q$ = {table:TV,
brand:Samsung, type:LED, diagonal:50}, let the target number
of results be 3, and let the maximum number of relaxed
queries considered be $T = 10$. Let the step size be
$\epsilon = 0.1$ for all attributes. Greedy-Rewrite will
proceed as in Table 3. At termination, it returns the relaxed
query {table:TV, brand:Samsung ±0.2, type:LED ±0.1,
diagonal:50 ±0.3}, which yields 3 results in our example
database.

If, after T relaxed queries have been evaluated, none is
found to have an estimated number of matches of at least k,
the last relaxed query (i.e., the one with the largest
amount of relaxation) is returned.
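
The greedy loop can be sketched as follows. The count table is hypothetical: it is chosen to agree with the fractions quoted in the running example where those are stated, and assumed elsewhere. Relaxation amounts are tracked as integer numbers of steps of size 0.1, and ties are broken by attribute order, which is also an assumption.

```python
from math import prod

# COUNTS[a][s]: assumed number of items whose value for attribute a lies within
# s steps of size eps = 0.1 of the queried value (hypothetical data).
COUNTS = {
    "brand": [5, 5, 8, 10, 10, 10],
    "type": [4, 8, 8, 8, 8, 8],
    "diagonal": [1, 4, 4, 7, 7, 7],
}
DB_SIZE = 10


def greedy_rewrite(counts, db_size, k, max_queries):
    """Relax the most constraining attribute one step at a time.

    Returns (steps per attribute, found), where found tells whether the
    estimate reached k within max_queries relaxed queries.
    """
    attrs = list(counts)
    steps = {a: 0 for a in attrs}
    # EST >= k  <=>  prod(counts) >= k * |P|^(m-1); integers avoid rounding.
    threshold = k * db_size ** (len(attrs) - 1)
    for _ in range(max_queries):
        relaxable = [a for a in attrs if steps[a] + 1 < len(counts[a])]
        if not relaxable:
            break
        # Most constraining attribute: smallest current histogram count.
        tightest = min(relaxable, key=lambda a: counts[a][steps[a]])
        steps[tightest] += 1
        if prod(counts[a][steps[a]] for a in attrs) >= threshold:
            return steps, True
    return steps, False


result, found = greedy_rewrite(COUNTS, DB_SIZE, k=3, max_queries=10)
```

On this data the loop stops after six relaxed queries with brand relaxed by two steps, type by one, and diagonal by three, which corresponds to the relaxed query returned in the example above, with an estimate of 4.48.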

4.3.2 DP-Rewrite 

Drawing on ideas similar to the dynamic program for
knapsack-style problems, we also consider a dynamic
programming heuristic DP-Rewrite. For a query
$q = \{a_1 : v_1, a_2 : v_2, \ldots, a_m : v_m\}$, let

$F(j, d)$ = Maximum fraction of products satisfying the
relaxed query on attributes $a_1, \ldots, a_j$ with total
relaxation $\sum_{i=1}^{j} \delta_i = d$.

Let $\epsilon$ be a parameter to the heuristic that
determines the step size, i.e., by what increment we increase
the relaxation of an attribute. For each cell in
$F(\cdot, \cdot)$, we need to consider one new relaxation.
Therefore, for a given maximum number of relaxations $T$, we
can consider only $p = \lfloor T/m \rfloor$ different values
for each attribute. Hence, we compute $F(j, d)$ using dynamic
programming as described in Algorithm 1.

The optimal solution is given by the smallest $d'$ for which
$F(m, d')$ is at least $k/|P|$. The amount of relaxation for
each attribute can be kept track of by an auxiliary table.

Consider again the query $q$ = {table:TV, brand:Samsung,
type:LED, diagonal:50}, let the target number of results be
3, and let the maximum number of relaxations considered be
$T = 15$. With step size $\epsilon = 0.1$, a sample execution
of DP-Rewrite is illustrated in Table 4. At termination, it
returns the relaxed query {table:TV, brand:Samsung ±0.3,
type:LED ±0.1, diagonal:50 ±0.1}, which yields 3 results in
our example database. Note, however, that if $T = 10$, then
$p = 3$, and hence the algorithm will only be able to
evaluate up to $F(3, 0.3)$, and will fail to find a
relaxation.

Algorithm 1 Dynamic program for Query Rewrite Using
Histograms, with step size ε and p = ⌊T/m⌋.

    for d ← 0, ε, 2ε, ..., min(pε, 1) do
        ...
    end for
    for j ← 2 to m do
        for d ← 0, ε, 2ε, ..., min(pε, j) do
            ...
        end for
    end for

         Attr 1 (b)   Attr 2 (t)            Attr 3 (d)
d        F(1,d)       F(2,d)                F(3,d)
0.0      0.50         0.50 * 0.40 = 0.20    0.20 * 0.10 = 0.020
0.1      0.50         0.50 * 0.80 = 0.40    0.20 * 0.40 = 0.080
0.2      0.80         0.50 * 0.80 = 0.40    0.40 * 0.40 = 0.160
0.3      1.00         0.80 * 0.80 = 0.64    0.40 * 0.40 = 0.160
0.4      1.00         1.00 * 0.80 = 0.80    0.64 * 0.40 = 0.256
0.5      1.00         1.00 * 0.80 = 0.80    0.80 * 0.40 = 0.320

Table 4: DP-Rewrite with ε = 0.1, p = 15/3 = 5, and
k = 3, i.e., k/|P| = 0.3.

Similar to Greedy-Rewrite, if no relaxed query with an 
estimated number of matches of at least k is found at the 
end of having evaluated T relaxed queries, the relaxed query 
with the largest amount of relaxation is returned. 
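
A sketch of a dynamic program in this spirit is given below. The cell update is an assumption: each cell takes a full maximum over how the budget d is split between the new attribute and the earlier ones, so intermediate cells need not coincide with those of Algorithm 1. The count table is hypothetical, chosen to agree with the fractions of the running example where stated, and relaxation is measured in integer steps of size 0.1.

```python
from fractions import Fraction

# COUNTS[a][s]: assumed number of items whose value for attribute a lies within
# s steps of size eps = 0.1 of the queried value (hypothetical data).
COUNTS = {
    "brand": [5, 5, 8, 10, 10, 10],
    "type": [4, 8, 8, 8, 8, 8],
    "diagonal": [1, 4, 4, 7, 7, 7],
}
DB_SIZE = 10


def dp_rewrite(counts, db_size, k, max_queries):
    """Return (steps per attribute, found) with the smallest total relaxation."""
    attrs = list(counts)
    p = max_queries // len(attrs)  # relaxation levels allowed per attribute

    def fraction(attr, s):
        # Fraction of items within s steps; clamp s to the tabulated range.
        column = counts[attr]
        return Fraction(column[min(s, len(column) - 1)], db_size)

    # table[d] = (best fraction, steps chosen so far) for total relaxation d.
    table = {d: (fraction(attrs[0], d), (d,)) for d in range(p + 1)}
    for attr in attrs[1:]:
        table = {
            d: max(
                (
                    (table[d - s][0] * fraction(attr, s), table[d - s][1] + (s,))
                    for s in range(d + 1)
                ),
                key=lambda cell: cell[0],
            )
            for d in range(p + 1)
        }
    target = Fraction(k, db_size)
    for d in range(p + 1):  # smallest total relaxation first
        if table[d][0] >= target:
            return dict(zip(attrs, table[d][1])), True
    return dict(zip(attrs, table[p][1])), False


found_15 = dp_rewrite(COUNTS, DB_SIZE, k=3, max_queries=15)
found_10 = dp_rewrite(COUNTS, DB_SIZE, k=3, max_queries=10)
```

With T = 15 the sketch returns brand relaxed by three steps and type and diagonal by one step each, for a total relaxation of 0.5; with T = 10 only totals up to 0.3 are explored and no relaxation reaches the target, consistent with the example.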

4.3.3 Trade-off Between the Heuristics 

There is a trade-off between the two heuristics described.
On the one hand, for any fixed $\epsilon$, if the maximum
number of relaxed queries allowed is large, DP-Rewrite is
guaranteed to find a relaxed query $q$ with $tr(q)$ no larger
than the one found by Greedy-Rewrite.^1 However, when the
number of relaxed queries allowed is small, DP-Rewrite will
be able to investigate only solutions with a small total
amount of relaxation, and may fail to find a solution when
Greedy-Rewrite succeeds. We explore this trade-off more fully
in the experiments.

5. EXPERIMENTAL EVALUATION 

In this section, we study the behavior and performance of
our algorithms in rewriting real user queries.



^Note that this does not guarantee that the results returned by
DP-Rewrite are necessarily better than those returned by
Greedy-Rewrite when measured by the objective of Equation (2),
since aggregate distance and total relaxation are not
equivalent.
5.1 Experimental Setup

For our experimental evaluation we built a prototype search
engine and populated it with real data from the shopping
vertical of a commercial search engine. To this end,
we downloaded the detailed descriptions of about 5 million
products in 73 electronics categories (such as
Televisions, Equalizers, GPS Receivers, etc.) from [2]. Each
product is provided in structured form with its attributes
clearly specified, as in [1]. We indexed the product details and
computed the histograms for the attributes as described in 
Section m 

As our query set we used a random sample of one thousand
queries from the log of a major commercial search engine that
was provided to us. We selected queries that were directed
to the categories described above and for which we extracted
attribute-value pairs to form the corresponding structured
queries. We used well-known techniques [17, 21] to extract the
attribute information from the queries. The categorization
and translation to structured queries was verified manually
to be correct.

We ran all thousand queries through our system and selected
the ones that triggered a query rewrite because they
returned too few (fewer than k) results. Out of the thousand
queries, 343 would benefit from query rewrites. Since the
queries were a random sample of queries targeted towards
the structured data that we have available, approximately
34% of such queries could potentially benefit. In
the remainder of this section, we use these 343 queries as
our query set to evaluate our techniques in depth.

5.1.1 Comparison Method 

As observed in the Introduction, for queries that trigger
very few results, amazon.com rewrites the query by dropping
words from the query. To take advantage of the semantic
parser, instead of dropping words from the query, we
implemented a version that removes attributes from the
structured interpretation of the query. The attribute to remove is
the one that is the most constraining.
We compare our method to this approach, which we term
Attribute-Removal. We present its performance in
Section 5.3. Note that there is no parameter to tune for this
algorithm.
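
As a rough illustration of this baseline, the sketch below removes one attribute from a structured query. It is a minimal sketch, not the implementation used in the experiments: it assumes that "most constraining" means the attribute whose requested value matches the fewest items, and the function name, the item layout and the sample data are all hypothetical.

```python
def attribute_removal(query, items):
    """Drop the attribute whose constraint matches the fewest items.

    query: dict mapping attribute name -> requested value.
    items: list of dicts describing products.
    """
    def match_count(attr):
        return sum(1 for item in items if item.get(attr) == query[attr])

    most_constraining = min(query, key=match_count)
    return {a: v for a, v in query.items() if a != most_constraining}


items = [
    {"brand": "Samsung", "type": "LED", "diagonal": 46},
    {"brand": "Samsung", "type": "LED", "diagonal": 55},
    {"brand": "Sony", "type": "LED", "diagonal": 50},
    {"brand": "LG", "type": "LCD", "diagonal": 42},
]
query = {"brand": "Samsung", "type": "LED", "diagonal": 50}
rewritten = attribute_removal(query, items)
```

Here the diagonal constraint matches only one item, so it is removed and the rewritten query keeps brand and type.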

5.1.2 Distance Function 

Within our prototype search engine, we also implemented
a distance function to be used for ranking and evaluating our
results after query rewrite. As our aggregate distance function
ad(·) we considered the average distance of the query
to the items in our data set. More specifically, for a given
query q = {a_1 : v_1, a_2 : v_2, ..., a_m : v_m} and an item p,
$ad(p, q) = \frac{1}{m} \sum_i d_i(v_i, p_{a_i})$, where
$d_i(v_i, p_{a_i})$ is the individual distance between q and p
on attribute a_i.

One natural definition of distance (or similarity) between
attribute values is based on the notion of substitutability, i.e.,
the likelihood of a user substituting her desired attribute
value v (specified in the query) by eventually choosing a
product with a different attribute value v'. For example, a
user looking for a nikon digital camera is much more likely to
substitute the brand for another well-recognized brand such
as canon rather than an obscure one like yashica. Thus, the
distance between nikon and canon is expected to be smaller
than that between nikon and yashica. Similar intuition holds
for a numerical attribute as well. Consider a user buying a
32-inch led tv. She is more likely to eventually buy a 36-inch
than a 60-inch led tv.

In our implementation, we define d_i as the normalized distance
of the two attribute values when they are numeric, i.e.,
$d_i(v_i, p_{a_i}) = \min\left(1.0, \frac{|v_i - p_{a_i}|}{|v_i|}\right)$.
For categorical attributes,
we compute this distance measure using a methodology similar
to the one described in [19], based on browse trails
originating from search engines. As these distances are based on
search logs, certain attribute values appear very rarely, leading
to no estimate for certain pairs of attribute values. For
example, for the attribute model, distances between pairs of
model numbers could be missing. In such cases, we conservatively
take the missing distance to be the
maximum possible distance of 1.
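
The distance definitions above can be sketched as follows. This is a minimal sketch under stated assumptions rather than the prototype's code: the numeric distance is normalized by the magnitude of the query value (an assumption), categorical distances come from a lookup table with missing pairs defaulting to 1 as described in the text, and the function names and the 0.2 entry for (nikon, canon) are purely illustrative.

```python
def numeric_distance(v, p):
    """Normalized numeric distance, capped at 1.0.

    Assumption: the difference is normalized by the query value |v|.
    """
    if v == 0:
        return 0.0 if p == 0 else 1.0
    return min(1.0, abs(v - p) / abs(v))


def categorical_distance(v, p, table):
    """Look up a learned distance; a missing pair gets the maximum, 1.0."""
    if v == p:
        return 0.0
    return table.get((v, p), 1.0)


def aggregate_distance(query, item, numeric_attrs, tables):
    """Average of the per-attribute distances between a query and an item."""
    total = 0.0
    for attr, v in query.items():
        if attr in numeric_attrs:
            total += numeric_distance(v, item[attr])
        else:
            total += categorical_distance(v, item[attr], tables.get(attr, {}))
    return total / len(query)


tables = {"brand": {("nikon", "canon"): 0.2}}  # illustrative value only
query = {"brand": "nikon", "megapixels": 10}
near = aggregate_distance(query, {"brand": "canon", "megapixels": 12},
                          {"megapixels"}, tables)
far = aggregate_distance(query, {"brand": "yashica", "megapixels": 10},
                         {"megapixels"}, tables)
```

In this toy example the canon item ends up closer to the query than the yashica item, because the missing (nikon, yashica) pair is charged the maximum distance.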

For our performance metric Mean-Dist, we will use the
mean distance (as captured by ad(·)) over all items in our
result set, i.e., we will use Equation (2). An algorithm may
fail to find at least k results, either because poor estimates
overestimate the number of matches of a relaxed query, or
because the algorithm has exhausted its budget of T relaxed
queries. To penalize such cases, we
treat any shortfall as having retrieved documents that are
at the maximum possible distance of 1 from the query. Under
this penalty, an algorithm that finds a relaxed query that
obtains at least k results will do better than one that does
not.
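
The shortfall penalty can be made concrete with a small sketch. The function name `mean_dist` is hypothetical, and the sketch assumes k is at least 1 and that `distances` holds the aggregate distance of each retrieved item.

```python
def mean_dist(distances, k):
    """Mean aggregate distance of a result set, padding any shortfall
    below k results with phantom results at the maximum distance 1.0."""
    shortfall = max(0, k - len(distances))
    total = sum(distances) + shortfall * 1.0
    return total / (len(distances) + shortfall)


# Three results retrieved when five were required: two phantom results
# at distance 1 are added, giving (0.1 + 0.2 + 0.3 + 1 + 1) / 5.
penalized = mean_dist([0.1, 0.2, 0.3], k=5)
# Enough results retrieved: the plain mean is used.
unpenalized = mean_dist([0.5] * 10, k=10)
```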

Finally, for the experiments presented in this section we 
set the number of returned results k = 10. 

5.2 Varying the Step Size 

We start our experimental evaluation by studying the effect
of the step size e on the performance of our query rewrite
algorithms. Both Greedy-Rewrite and DP-Rewrite use
a parameter e that determines the amount of relaxation of
an attribute at a step of the algorithm. Intuitively, for small
e, we are making smaller, more careful steps when relaxing,
so we expect that the furthest item will be quite close to
the k-th item. On the other hand, if e is large, we are relaxing
more aggressively and we may retrieve significantly more
than k results, and thus our performance metric may be worse.

To study this effect in more detail, we evaluated our
algorithms over our data and plotted the graphs shown in
Figure 2 for Greedy-Rewrite and in Figure 3 for DP-Rewrite.
The results for Attribute-Removal are not affected
by the step size e or the number of steps T. They
are shown in Figure 5 and are omitted from Figures 2 and 3 for
presentation clarity.

The algorithms were allowed up to a total of 20 steps,
which ensured that they would consider rewrites that would
return at least k = 10 results. The horizontal axis shows
increasing values of e and the vertical axis shows the average
Mean-Dist at a given e value. Lower values on the vertical
axis indicate better performance.

In the case of Greedy-Rewrite, we observe that increas- 
ing step sizes lead to a larger value under our performance 
metric, i.e., worse results. As our algorithms become more 
aggressive (increasing e) they allow for the result set to 
grow much larger than k and thus Mean-Dist increases. 
Of course, smaller e values imply better performance but at 
the cost of requiring more steps until completion. 

Figure 2: Effect of step size e on the distance of the
furthest result for Greedy-Rewrite.

Figure 3: Effect of step size e on the distance of the
furthest result for DP-Rewrite.

Figure 4: Average Mean-Dist after a given number of
time steps for all the algorithms for step size e = 0.1.

The picture for DP-Rewrite is more complicated. When
the number of steps is very small, it faces a trade-off in
choosing the step size. When the step size is small, DP-Rewrite
fails to find rewrites that retrieve at least k results, leading
to poor performance as it is penalized for the shortfall;
when the step size is large, DP-Rewrite finds rewrites that
obtain at least k results, but now with a large total amount
of relaxation across all attributes. Hence, we see a U-shaped
curve for a small number of steps. When the number of steps
is large, the Mean-Dist of DP-Rewrite is closer to
monotonically increasing in the step size, as relaxations yielding
at least k results are found for any step size, and hence smaller
step sizes lead to better performance. Indeed, we see that
the three curves for number of steps = 12, 16, 20 overlap
one another, indicating that the same rewrite is found. The
small dip from e = 0.2 to e = 0.3 is due to a couple of queries
where the best relaxation is obtained by rewriting an attribute
to include values that are 0.3 (and 0.9) away, whereas for e = 0.2
these attributes have to include values that are 0.4 (and 1.0)
away.
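
The dip follows from relaxations being restricted to multiples of the step size: to include a value at a given distance, an attribute must be relaxed to the next multiple of e at or above that distance. A small sketch of this arithmetic, using exact fractions to avoid floating-point rounding (the function name is hypothetical):

```python
from fractions import Fraction
from math import ceil


def relaxation_needed(distance, step):
    """Smallest multiple of `step` that is at least `distance`."""
    d, e = Fraction(distance), Fraction(step)
    return ceil(d / e) * e


# With e = 0.2, values 0.3 and 0.9 away force relaxations of 0.4 and 1.0.
coarse = [relaxation_needed(d, "0.2") for d in ("0.3", "0.9")]
# With e = 0.3, the same values are reached exactly.
exact = [relaxation_needed(d, "0.3") for d in ("0.3", "0.9")]
```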

In both cases, we found that e = 0.1 gives reasonable
performance in our practical setting for number of steps
T > 2, so we will use this value for the remainder of our
experiments. We also observe that, overall, DP-Rewrite
performs better than Greedy-Rewrite because
it can keep track of the best rewrite among all the
candidate rewrites it has explored for any given T.

5.3 Varying Number of Steps 

We now turn to study the performance of our algorithms
in terms of the number of steps allocated to them. We
fix the step size to 0.1 and look at different numbers of steps.



At a high level, we assume that, on average, each query
rewrite estimation takes approximately the same time to
compute. To this end, we ran all three algorithms over
our data set and compared their performance, which is
shown in Figure 4. In the figure, the horizontal axis is the
number of steps, and the vertical axis is the average
Mean-Dist at a given number of steps.

The first observation is that both Greedy-Rewrite and
DP-Rewrite perform substantially better than
Attribute-Removal. The second observation is that a larger number
of steps does not necessarily translate to better performance.
This may appear counter-intuitive, as one would assume that
with more steps, the relaxation algorithm would get to
"explore" the attribute space more fully and arrive at the right
attribute combinations to relax. For Greedy-Rewrite, however,
this need not be the case. In cases
where the estimation routine underestimates the number of
results, Greedy-Rewrite will continue to relax beyond the
point necessary, leading to a set of results with higher
Mean-Dist. A run with fewer steps, by contrast, terminates
because its steps are exhausted, and the relaxed query it
returns may happen to retrieve a
sufficient number of results, leading to lower Mean-Dist.
Indeed, the performance of Greedy-Rewrite deteriorates
after 10 steps, since the additional relaxation of the attributes
only adds more unrelated results to the result
set.

In contrast, for DP-Rewrite, increasing the number of
steps leads to steady improvements in Mean-Dist. While
in principle DP-Rewrite may suffer from the same
underestimation problem as Greedy-Rewrite,
it explores the space of relaxed queries more
completely and so is less affected by poor estimation than
Greedy-Rewrite. Nonetheless, by around 12 steps, the
quality of the results does not improve any further, as it starts
to find exactly the same relaxed query.

Finally, Figure 5 summarizes the relative performance of
all three algorithms for different values of the step size e,
with the number of steps fixed at T = 10. Again, we observe
that both Greedy-Rewrite and DP-Rewrite outperform
Attribute-Removal.

5.4 Testing for Attribute Dependencies 

As we discussed in Section [l] one preprocessing step that 
we may apply to our algorithms is to identify attribute de- 
pendencies and drop dependent attributes from the query 

Figure 5: Average Mean-Dist for different step sizes
for all the algorithms for number of steps T = 10.

Figure 6: Average Mean-Dist when, for two dependent
attributes a → b (a implies b), a is dropped and
b is dropped, for different values of e.



before rewriting it. Attribute (or functional) dependencies
are very useful in optimizing queries in database systems as
they can capture the relations between attributes. Our
high-level intuition is that if attribute dependencies are present
in our data set, it would help to take them into consideration:
since they capture dependence across attributes,
accounting for them can help with estimation, which
in turn helps to find better relaxed queries. To study the
presence and effect of attribute dependencies on our
algorithms, we computed the conditional probabilities for all
pairs of attributes and kept only those that were higher
than 0.9.
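
One way to compute such conditional probabilities is sketched below. It is a minimal sketch, assuming the probabilities are estimated by simple counting over the item table; the function name, the data layout and the model identifiers are hypothetical.

```python
from collections import Counter


def strong_dependencies(items, a, b, threshold=0.9):
    """Return {(v, w): P(a = v | b = w)} for value pairs above the threshold."""
    pair_counts = Counter((item[a], item[b]) for item in items)
    b_counts = Counter(item[b] for item in items)
    return {
        (v, w): n / b_counts[w]
        for (v, w), n in pair_counts.items()
        if n / b_counts[w] > threshold
    }


items = [
    {"brand": "samsung", "model": "UN46"},
    {"brand": "samsung", "model": "UN46"},
    {"brand": "samsung", "model": "UN55"},
    {"brand": "sony", "model": "KDL40"},
]
# Every model determines its brand, so all three pairs are kept.
brand_given_model = strong_dependencies(items, "brand", "model")
# A brand does not determine the model unless it has a single model.
model_given_brand = strong_dependencies(items, "model", "brand")
```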

Given a pair of attributes a and b where P(a = v | b = w)
is above this threshold for a large number of pairs of attribute
values v and w, we need to decide whether we should drop
attribute a or b from the query before relaxing. As we discussed
in Section 4, the best choice depends on the distribution of
values and the distance function for attribute a or b. For
example, if a is more selective (i.e., appears in fewer tuples)
than b, it may be better to drop attribute a, as doing so is
expected to relax the query
more than if we dropped b. On the other hand, dropping b
may also help, since the tuples it appears in are already partially
implied by a.

To this end, we repeated the experiment for identifying a
good step size with the attribute-dependency preprocessing
enabled. We computed the results for both alternatives for
dropping an attribute (that is, either a or b). We report
the results in Figure 6 for the Greedy-Rewrite algorithm
using small and large values of T (T = 2 and T = 10,
respectively). The findings for the DP-Rewrite algorithm
are similar.

Figure 7: The median of the number of results processed
by the index using all three algorithms for
different values of T and for step size e = 0.1.

The overall result is surprising: neither approach to
incorporating attribute dependencies, dropping attribute a
or dropping attribute b, has led to better performance,
and in some cases performance is even worse. To understand
this better, we performed a query-by-query analysis of the
results, and found that the problem arises from
a complex chain of interactions. First, a significant fraction
of these queries are '<brand> <model>' queries. The attribute
dependencies we found are also between the attributes
brand and model, where each model is associated with a
unique brand. As mentioned in Section 5.1.2, we do not
have many distances estimated between models due to data
sparsity. When the attribute model is dropped, we retrieve
a number of different models of the same brand, all of which
are considered to be quite far away from the query as we
treat missing distances as 1. When the attribute brand is
dropped, the situation is even worse, as the algorithm will
now relax the attribute model to close to distance 1 in order
to find a sufficient number of results, due to missing distances.
Hence, in such cases, performance is worse than when no
attribute is dropped at all, as the results are no longer
constrained by brand.

5.5 Index Performance 

In another experiment, we measured the work done by the
index in terms of the number of documents it processes.
The processing done by the index typically includes
computing ranking features and scoring the document for
the given query. As the processing takes time, one would
like the number of documents processed by the index to be close
to the number estimated by the rewrite algorithm. Figure 7
illustrates the performance of the algorithms in terms of
processing done by the index for step size e = 0.1.

The general trend is that DP-Rewrite produces rewrites
that give close to the desired number of results of k = 10,
and generates the least work for the index among the three
algorithms. At the other extreme, Attribute-Removal
produces rewrites that generate the most work for the index
because it removes the chosen attribute completely.
Greedy-Rewrite spans the performance gap between
these two algorithms. For lower values of T, it admits
a smaller number of documents into the filter
set, while at higher values of T it comes close to
Attribute-Removal in terms of the number of documents
admitted into the filter set. The reason for the behavior
exhibited by Greedy-Rewrite is as follows. As Greedy-Rewrite
explores one attribute at a time, and chooses its
next step based on its current relaxed query, it performs a
depth-first-like search through the space of relaxed queries.
In many cases, because it prioritizes relaxing
the most selective attribute, it ends up repeatedly
relaxing the same attribute until that attribute is completely
relaxed. This type of relaxed query typically
retrieves significantly more results. Note,
however, that the result set may still have similar average quality
as measured by Mean-Dist, as confirmed by the figures in
the previous sections.
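
The tendency to relax a single attribute all the way can be seen in a toy sketch. This is a simplified stand-in for Greedy-Rewrite rather than the algorithm as defined earlier in the paper: it assumes that each step relaxes, by one step, the attribute whose current estimated match fraction is lowest, and that the overall estimate is the product of per-attribute fractions. The histogram values and the function name are hypothetical.

```python
def greedy_rewrite_sketch(hists, max_steps, target):
    """Repeatedly relax the currently most selective attribute by one step."""
    m = len(hists)
    steps = [0] * m

    def estimate():
        prod = 1.0
        for j in range(m):
            prod *= hists[j][steps[j]]
        return prod

    for _ in range(max_steps):
        if estimate() >= target:
            break
        # Attributes that can still be relaxed further.
        open_attrs = [j for j in range(m) if steps[j] + 1 < len(hists[j])]
        if not open_attrs:
            break
        j = min(open_attrs, key=lambda a: hists[a][steps[a]])
        steps[j] += 1
    return steps, estimate()


hists = [
    [0.5, 0.6, 0.7, 0.8, 0.9, 1.0],     # mildly selective attribute
    [0.01, 0.02, 0.05, 0.1, 0.3, 1.0],  # highly selective attribute
]
steps, estimated = greedy_rewrite_sketch(hists, max_steps=10, target=0.3)
```

In this example the second attribute stays the most selective one at every step, so it is relaxed five times in a row until it is fully relaxed, while the first attribute is never touched.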

6. CONCLUSION 

In this paper we propose a query-rewrite framework for 
answering structured web queries when users pose queries 
that would have led to very few results. Our framework 
takes into account the stringent time requirement of answer- 
ing web queries, and balances it with the need of retrieving 
results close to the user queries. We describe two approaches 
to solving this problem, and show experimentally that both 
solutions produce meaningful results given our constraints. 

After studying the performance of the three algorithms
with respect to parameters like step size and the number of
rewrites to explore, we conclude that if the time envelope admits
more rewrites, then DP-Rewrite is more applicable. Under
extremely tight latency restrictions, Greedy-Rewrite
is a better choice.

The approaches proposed in this paper are especially
important in domains where there is an underlying source of
structured data, but where users lacking domain expertise
may end up issuing queries that have few or even zero
matches. This work contributes to the growing literature on how
to efficiently surface structured results in response to web
queries.